Introduction to R

Session 4

Session Overview

  1. Inputs and Outputs
  2. Graphics
  3. Bonus: Advanced Graphics with ggplot

Today

Nalan Bastürk

  • Associate Professor at QE
  • Research interests: econometrics, Bayesian statisics, financial econometrics
  • Website
  • Sessions 4 and 5

Stephan Smeekes

  • Professor of Econometrics at QE
  • Research interests: econometrics, time series, high-dimensional statisics, bootstrap, macro- and climate econometrics
  • Website
  • Sessions 2, 3 and 4

Inputs and outputs of your R session

  • Common inputs / outputs of an R session are datasets or R scripts.
  • For this meeting we focus on datasets as inputs to the R session, loading data and saving data.
  • The file format (csv, Rdata, xls, stata …) and directory of the files are important keep in mind.
  • We will go over a few options, potential issues, and how to avoid the need to type in a long directory name when managing the input and output of the R session.

Working directories and R projects

  • Whenever we provide R with a file name, it can include the full path on the computer.
  • An alternative is to work on a specified directory.
  • Another alternative is to work within a ‘project’ that all paths are visible to the project scripts.
  • If we do not provide any path, R will use the current “working directory” for reading or writing files. It can be obtained by the command
  getwd()

Using the correct directory to get input / output of the R session

  • Navigating through the menus in RStudio is easy, (click and go) but requires using the menu every time the user runs the code.

  • Go to Session -> Set Working Directory. Two convenient options are:

    • Choose Directory…: Choose the directory yourself

    • To Source File Location: Set the working directory to the directory where your R Script (the source file) is saved

Using the correct directory to get input / output of the R session

  • An alternative is to use function setwd() at the beginning of your script. This line then has to be changed when the code runs in another machine.
 setwd("~/R_training")

Using the correct directory to get input / output of the R session

  • Recall: To set the working directory to the folder where your current R script is located, you can simply use:
library(this.path)
setwd(here())
  • Recall: We could also explicitly make the function call from the library:
setwd(this.path::here())

Types of input or data that can be loaded in R

R interacts with files in several ways.

  • You can load, save, import, or export a data file.
  • You can save a generated figure as a graphics file or store regression tables as text, spreadsheet, or LATEX tables.
  • You can load, save the full workspace (environment) you are working with to follow up another time.

Datasets can come in different formats.

  • Rdata files: Files that can directly
  • Other file formats (SPSS csv, xls, …) are also possible to load in R. This often requires the use of packages

Loading Rdata files

  • Rdata files are specific to R file formats.
  • They can store a single object or several objects.
  • These files are the easiest to manage as input or output in R, since they don’t require library calls.

Load climate data from RData format:

  • load function is used to load data in Rdata format.
  • load function loads all objects in the input Rdata file.

Example data: average temperatures for Maastricht and Eindhoven during 2024, every 2nd day of the month

load("data/climate_long.Rdata")
print(long_data)
             NAME MONTH TEMP
99091   EINDHOVEN     1 10.6
99122   EINDHOVEN     2  7.1
99151   EINDHOVEN     3 10.2
99178   EINDHOVEN     4  8.9
99207   EINDHOVEN     5 18.5
99238   EINDHOVEN     6 15.0
99268   EINDHOVEN     7 15.0
99299   EINDHOVEN     8 20.7
99329   EINDHOVEN     9 22.6
99359   EINDHOVEN    10 10.0
99390   EINDHOVEN    11 10.5
99415   EINDHOVEN    12  9.7
99801  MAASTRICHT     1  9.7
99832  MAASTRICHT     2  5.9
99861  MAASTRICHT     3  9.9
99888  MAASTRICHT     4  9.0
99917  MAASTRICHT     5 15.7
99948  MAASTRICHT     6 14.1
99978  MAASTRICHT     7 14.9
100009 MAASTRICHT     8 20.5
100039 MAASTRICHT     9 21.6
100069 MAASTRICHT    10 10.3
100100 MAASTRICHT    11 10.2
100125 MAASTRICHT    12  9.2

Notice the data are loaded as a dataframe:

is.data.frame(long_data)
[1] TRUE
typeof(long_data)
[1] "list"

There is a distinction between a long and wide dataframe:

load("data/climate_wide.Rdata")
print(wide_data)
   MONTH EINDHOVEN MAASTRICHT
1      1      10.6        9.7
2      2       7.1        5.9
3      3      10.2        9.9
4      4       8.9        9.0
5      5      18.5       15.7
6      6      15.0       14.1
7      7      15.0       14.9
8      8      20.7       20.5
9      9      22.6       21.6
10    10      10.0       10.3
11    11      10.5       10.2
12    12       9.7        9.2

Loading other formats of data in R

Option 1: Using menus within RStudio is the easiest (click and go) but requires using the menu every time the user runs the code.

Loading other formats of data in R

Option 1: Using menus within RStudio (cont’d)

Loading other formats of data in R

Option 1: Using menus within RStudio (cont’d)

Loading other formats of data in R

Option 1: Using menus within RStudio (cont’d)

Loading other formats of data in R

Advice for option 1:

  • Copy the command that appears after loading the data from the menus.

Loading other formats of data in R

Advice for option 1 (cont’d):

  • Paste the command on top of your script.
  • This way, next time you do not need the menu navigation.
library(readxl)
climate <- read_excel("data/climate_wide.xlsx")
  • You can view the data by clicking on it in the `Environment’ at the top-right of the workspace.

General way of importing and exporting of other data formats

  • Using the correct libraries for different data formats can be tedious.
  • R package rio is very convenient for data import and export. It figures out the type of data format from the file name extension, e.g. .csv for CSV, .dta for Stata, or *.sav for SPSS data sets
  • For a complete list of supported formats, see help(rio).
  • It calls an appropriate package to do the actual importing or exporting.

Loading SPSS and other file types

library('rio')
import("data/climate_wide.dta")

Loading csv files

import("data/climate.csv")

Outputs

  • Outputs work very similarly to the inputs above.
  • The most relevant outputs formats are the R output formats.
  • save() saves objects as an .RData file.
  • save.image() saves a selection of objects as an .RData file.

Exercise 4.1: Saving data

  • Save your current workspace using function save.image().
  • Save only one variable in the workspace using function save().
  • Make a list of two variables from long_data, and save this list using function save.

R Base Graphics

  • We will cover R base graphics.
  • Other alternatives include `ggplot2’…

To create plots with R’s standard graphics package, there are high-level and low-level plotting functions.

  • High-level functions generate a new graphic (and open a device).
  • Low-level functions add elements to an existing graphic.

Simple plots

plot(long_data$TEMP)     ## Plotting a single variable

plot(x = long_data$MONTH, y = long_data$TEMP)     ## Scatter plot wrt month

Multiple plots

par(mfrow = c(1,2)) # multiple plots in a row
plot(long_data$TEMP)     ## Plotting a single variable
plot(x = long_data$MONTH, y = long_data$TEMP)     ## Scatter plot wrt month

Functions calling methods

Notice that function plot() calls methods.

It will perform different operations depending on the class of the passed object. (We study the lm() function in detail in the next session!)

ols_result <- lm(TEMP~MONTH, data = long_data)
plot(ols_result)

Creating and saving a graph

pdf("figures/plot_data_short.pdf")
hist(wide_data$MAASTRICHT, breaks = 20)
dev.off()

Customizing Graphics

  • Adding points to an existing plot
  • Function `dev.off()’is called after all the plotting, to save the file and return control to the screen.
plot(wide_data$MAASTRICHT) # temperatures in Maastricht
lines(wide_data$EINDHOVEN) # temperatures in Eindhoven in lines

Customizing Graphics

  • The plot() function takes several many arguments that can change the layout of the plots. See ?par for all graphical options; there are many!

  • Some examples:

    • col: color of lines / points
    • lty, lwd: Line type and thickness
    • pch: Point type (1-16)
    • main, sub: Title, subtitle
    • xlab, ylab: x and y axis labels
    • log, xlog and ylog for logarithmic scales
    • xlim, ylim: x and y axis limits (for overriding R’s default choices)
    • mfcol, mfrow: Multiple plots in one graphics window (column-wise/row-wise)

Low-Level Graphic Functions

  • lines: Draw lines
  • abline: Quickly add horizontal, vertical lines, and lines using equation \(y = bx + a\)
  • points: Add points
  • arrows: Add arrows
  • title: Add a title
  • legend: Add a legend
  • text: Add text at \((x,y)\) coordinates
  • mtext: Add text with positional specification like side=1,...,4

Exercise 4.2: Plot maximum temperatures for Maastricht

We want to visualize the daily maximum temperatures in the climate data specifically for Maastricht. First, make a basic plot of temperatures in Maastricht then customise the plot in the following ways:

  1. The title of the X-axis should say ‘Month’, the title of the Y-axis ‘Average Temperature’.

  2. Make the plot a line plot with a blue line. (Hint: specifying the colour literally as "blue" works)

  3. Make the tick marks appear on the inside of the figure rather than the outside.

  4. Calculate the average temperature.

  5. Add a horizontal line with the average maximum temperature

You will need to consult the help file for this exercise; see this therefrom more as an exercise in how to navigate R’s help system, than an exercise in plotting (which we will cover in more detail later).

You may want to ask ChatGPT for help and then try to see if you could also have gotten the same answer yourself; it may not always give you the most straightforward answer though!

Manually saving R plots

  • Use the plot functions without creating a graph.
  • Use the `plots’ area to save image manually.

Different plot types

You can manually save graphs of several formats.

Best practice is to save a graph through a device such as pdf or similar:

  • pdf(): Adobe PDF (easily integrated into LaTeX).
  • svg(): Scalable Vector Graphics (commonly used on websites).
  • png(), jpeg(), tiff(), bmp(): Various bitmap formats.
jpeg("figures/MaasTemperature.jpeg")
plot(x = wide_data$MAASTICHT, y = wide_data$MONTH)
dev.off()

Bonus: Advanved Graphics using ggplot

Exercise 4.3: Use other packages for plots

  • Make the same plot in R using package `ggplot2’
  • You may want to use ChatGBT since the syntax of `ggplot2’ is quite different from what we covered so far.